Speech Recognition on MPEG/Audio Encoded Files

Authors

  • Lawrence Yapp
  • Gregory L. Zick
Abstract

A technique to perform speech recognition directly on audio files encoded using the MPEG/Audio coding standard is described. The technique works in the compressed domain and does not require the MPEG/Audio file to be decompressed. Only the encoded subband samples are extracted and processed for training and recognition. The underlying speech recognition engine is based on the hidden Markov model. The technique is applicable to Layers I and II of MPEG/Audio, and training under one layer can be used to recognize the other. Results for speaker-dependent, small-vocabulary recognition of continuously spoken sentences show accuracy as high as 99% using this technique.
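
The abstract does not say which features are computed from the subband samples, so the following is only an illustrative sketch, not the authors' method. It assumes the subband samples have already been extracted and dequantized from the bitstream, and it uses a per-subband log-energy feature chosen purely for illustration. What it relies on from the MPEG/Audio standard is that both layers use 32 subbands, with 12 samples per subband in a Layer I frame and 36 (three groups of 12) in a Layer II frame. Splitting every frame into 12-sample blocks is one plausible way to obtain feature vectors of the same shape from either layer. The names `block_features` and `frame_features` are hypothetical.

```python
import math

NUM_SUBBANDS = 32        # Layers I and II both use a 32-band filterbank
SAMPLES_PER_BLOCK = 12   # Layer I: 12 samples per subband; Layer II: 3 x 12


def block_features(block):
    """Return one log-energy value per subband for a block of 12 rows.

    `block` is a list of rows, each row holding 32 dequantized subband
    samples. Log-energy is an assumed feature, not taken from the paper.
    """
    features = []
    for sb in range(NUM_SUBBANDS):
        energy = sum(row[sb] ** 2 for row in block) / len(block)
        features.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return features


def frame_features(frame):
    """Split a frame (12 rows for Layer I, 36 for Layer II) into 12-row blocks
    and return one 32-dimensional feature vector per block."""
    if len(frame) % SAMPLES_PER_BLOCK != 0:
        raise ValueError("frame length must be a multiple of 12 rows")
    return [
        block_features(frame[i:i + SAMPLES_PER_BLOCK])
        for i in range(0, len(frame), SAMPLES_PER_BLOCK)
    ]
```

Under these assumptions a Layer I frame yields one feature vector and a Layer II frame yields three, all with 32 components, so the same recognizer input format could be used for both layers.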

Related resources

Environment recognition for digital audio forensics using MPEG-7 and mel cepstral features

Environment recognition from digital audio for forensics application is a growing area of interest. However, compared to other branches of audio forensics, it is a less researched one. Especially less attention has been given to detect environment from files where foreground speech is present, which is a forensics scenario. In this paper, we perform several experiments focusing on the problems ...

Speech-Music Discrimination from MPEG-1 Bitstream

This paper describes a proposed algorithm for speech/music discrimination, which works on data directly taken from MPEG encoded bitstream thus avoiding the computationally difficult decoding-encoding process. The method is based on thresholding of features derived from the modulation envelope of the frequency-limited audio signal. The discriminator is tested on more than 2 hours of audio data, ...

Handling large audio files in audio books for building synthetic voices

One of the issues in using audio books for building a synthetic voice is the segmentation of large audio files. The use of standard forced-alignment to obtain phone boundaries on large audio files fails primarily because of huge memory requirements. Earlier works have attempted to resolve this problem by using large vocabulary speech recognition system employing restricted dictionary and langua...

The effect of speech and audio compression on speech recognition performance

This paper proposes an in-depth look at the influence of different speech and audio codecs on the performance of our continuous speech recognition engine. GSM full rate, G711, G723.1 and MPEG coders are investigated. It is shown that MPEG transcoding degrades the speech recognition performance for low bitrates whereas performance remains acceptable for specialized speech coders like GSM or G711...

Audio Environment Recognition using Zero Crossing Features and MPEG-7 Descriptors

Problem statement: This study investigated zero crossing features and selected MPEG-7 audio descriptors for environment sound recognition applications such as audio forensics. Approach: The study implemented several experiments focusing on the problems of environment recognition from audio particularly for forensic applications. Results: It was investigated the effect of the temporal zero cross...

Publication date: 1997